AITopics | user turn

Collaborating Authors

user turn

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal

Liu, Yuhan, Zhang, Michael J. Q., Choi, Eunsol

arXiv.org Artificial IntelligenceOct-7-2025

Once language models (LMs) are deployed, they can interact with users long-term, ideally evolving based on their feedback. Asking for direct user feedback can be disruptive; thus, we study harvesting implicit user feedback from user-LM interaction logs. We study two user-LM interaction datasets (WildChat and LMSYS). First, we analyze user feedback in the user-LLM conversation logs, providing insights into when and why such feedback occurs. Second, we study harvesting learning signals from such implicit user feedback. Specifically, we study whether incorporating the contents of user feedback (e.g., user wanted clarification), in addition to the polarity of the feedback, can improve the model performance. We observe mixed results, showing this helps in short human-designed questions (MTBench) but not on longer and more complex questions (WildBench). Together, we provide an in-depth study of implicit user feedback, showing its potential and limitations.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.23158

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Reading Between the Lines: Scalable User Feedback via Implicit Sentiment in Developer Prompts

Nam, Daye, Salawa, Malgorzata, Chandra, Satish

arXiv.org Artificial IntelligenceSep-24-2025

Evaluating developer satisfaction with conversational AI assistants at scale is critical but challenging. User studies provide rich insights, but are unscalable, while large-scale quantitative signals from logs or in-product ratings are often too shallow or sparse to be reliable. To address this gap, we propose and evaluate a new approach: using sentiment analysis of developer prompts to identify implicit signals of user satisfaction. With an analysis of industrial usage logs of 372 professional developers, we show that this approach can identify a signal in ~8% of all interactions, a rate more than 13 times higher than explicit user feedback, with reasonable accuracy even with an off-the-shelf sentiment analysis approach. This new practical approach to complement existing feedback channels would open up new directions for building a more comprehensive understanding of the developer experience at scale.

large language model, natural language, sentiment, (16 more...)

arXiv.org Artificial Intelligence

2509.18361

Country: North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.57)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.57)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction

Zeng, Xingshan, Liu, Weiwen, Wang, Lingzhi, Li, Liangyou, Mi, Fei, Wang, Yasheng, Shang, Lifeng, Jiang, Xin, Liu, Qun

arXiv.org Artificial IntelligenceAug-19-2025

Agentic task-solving with Large Language Models (LLMs) requires multi-turn, multi-step interactions, often involving complex function calls and dynamic user-agent exchanges. Existing simulation-based data generation methods for such scenarios rely heavily on costly autoregressive interactions between multiple LLM agents, thereby limiting real-world performance of agentic tasks. In this paper, we propose a novel Non-Autoregressive Iterative Generation framework, called ToolACE-MT, for constructing high-quality multi-turn agentic dialogues. ToolACE-MT generates full conversational trajectories through three stages: coarse-grained initialization, iterative refinement, and offline verification. The initialization phase builds a structurally complete yet semantically coarse dialogue skeleton; the iterative refinement phase introduces realistic complexities and continued refinement via mask-and-fill operations; and the offline verification phase ensures correctness and coherence via rule- and model-based checks. Experiments demonstrate that ToolACE-MT enables efficient, effective and generalizable agentic data generation, offering a new paradigm for high-quality data construction in tool-augmented LLM scenarios.

large language model, machine learning, trajectory, (21 more...)

arXiv.org Artificial Intelligence

2508.12685

Country:

Asia > China (0.46)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

CONFETTI: Conversational Function-Calling Evaluation Through Turn-Level Interactions

Alkhouli, Tamer, Margatina, Katerina, Gung, James, Shu, Raphael, Zaghi, Claudia, Sunkara, Monica, Zhang, Yi

arXiv.org Artificial IntelligenceJun-3-2025

We introduce Conversational Function-Calling Evaluation Through Turn-Level Interactions (CONFETTI), a conversational benchmark1 designed to evaluate the function-calling capabilities and response quality of large language models (LLMs). Current benchmarks lack comprehensive assessment of LLMs in complex conversational scenarios. CONFETTI addresses this gap through 109 human-simulated conversations, comprising 313 user turns and covering 86 APIs. These conversations explicitly target various conversational complexities, such as follow-ups, goal correction and switching, ambiguous and implicit goals. We perform off-policy turn-level evaluation using this benchmark targeting function-calling. Our benchmark also incorporates dialog act annotations to assess agent responses. We evaluate a series of state-of-the-art LLMs and analyze their performance with respect to the number of available APIs, conversation lengths, and chained function calling. Our results reveal that while some models are able to handle long conversations, and leverage more than 20+ APIs successfully, other models struggle with longer context or when increasing the number of APIs. We also report that the performance on chained function-calls is severely limited across the models. Overall, the top performing models on CONFETTI are Nova Pro (40.01%), Claude Sonnet v3.5 (35.46%) and Llama 3.1 405B (33.19%) followed by command-r-plus (31.18%) and Mistral-Large-2407 (30.07%).

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.01859

Country:

Asia (0.28)
North America (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Chitchat as Interference: Adding User Backstories to Task-Oriented Dialogues

Stricker, Armand, Paroubek, Patrick

arXiv.org Artificial IntelligenceJun-28-2024

During task-oriented dialogues (TODs), human users naturally introduce chitchat that is beyond the immediate scope of the task, interfering with the flow of the conversation. To address this issue without the need for expensive manual data creation, we use few-shot prompting with Llama-2-70B to enhance the MultiWOZ dataset with user backstories, a typical example of chitchat interference in TODs. We assess the impact of this addition by testing two models: one trained solely on TODs and another trained on TODs with a preliminary chitchat interaction. Our analysis demonstrates that our enhanced dataset poses a challenge for these systems. Moreover, we demonstrate that our dataset can be effectively used for training purposes, enabling a system to consistently acknowledge the user's backstory while also successfully moving the task forward in the same turn, as confirmed by human evaluation. These findings highlight the benefits of generating novel chitchat-TOD scenarios to test TOD systems more thoroughly and improve their resilience to natural user interferences

backstory, computational linguistic, dialogue, (13 more...)

arXiv.org Artificial Intelligence

2402.15248

Country:

Europe > Czechia > Prague (0.04)
Asia > Thailand (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(9 more...)

Genre: Research Report (0.64)

Industry: Transportation (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

WildChat: 1M ChatGPT Interaction Logs in the Wild

Zhao, Wenting, Ren, Xiang, Hessel, Jack, Cardie, Claire, Choi, Yejin, Deng, Yuntian

arXiv.org Artificial IntelligenceMay-2-2024

Chatbots such as GPT-4 and ChatGPT are now serving millions of users. Despite their widespread use, there remains a lack of public datasets showcasing how these tools are used by a population of users in practice. To bridge this gap, we offered free access to ChatGPT for online users in exchange for their affirmative, consensual opt-in to anonymously collect their chat transcripts and request headers. From this, we compiled WildChat, a corpus of 1 million user-ChatGPT conversations, which consists of over 2.5 million interaction turns. We compare WildChat with other popular user-chatbot interaction datasets, and find that our dataset offers the most diverse user prompts, contains the largest number of languages, and presents the richest variety of potentially toxic use-cases for researchers to study. In addition to timestamped chat transcripts, we enrich the dataset with demographic data, including state, country, and hashed IP addresses, alongside request headers. This augmentation allows for more detailed analysis of user behaviors across different geographical regions and temporal dimensions. Finally, because it captures a broad range of use cases, we demonstrate the dataset's potential utility in fine-tuning instruction-following models. WildChat is released at https://wildchat.allen.ai under AI2 ImpACT Licenses.

conference paper, dataset, iclr 2024, (14 more...)

arXiv.org Artificial Intelligence

2405.0147

Country:

North America > United States > California (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
(7 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.69)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Is one brick enough to break the wall of spoken dialogue state tracking?

Druart, Lucas, Vielzeuf, Valentin, Estève, Yannick

arXiv.org Artificial IntelligenceDec-5-2023

In Task-Oriented Dialogue (TOD) systems, correctly updating the system's understanding of the user's needs (a.k.a dialogue state tracking) is key to a smooth interaction. Traditionally, TOD systems perform this update in three steps: transcription of the user's utterance, semantic extraction of the key concepts, and contextualization with the previously identified concepts. Such cascade approaches suffer from cascading errors and separate optimization. End-to-End approaches have been proved helpful up to the semantic extraction step. This paper goes one step further paving the path towards completely neural spoken dialogue state tracking by comparing three approaches: (1) a state of the art cascade approach, (2) a locally E2E approach with rule-based contextualization and (3) a completely neural approach.

cascade approach, dialogue, dialogue state tracking, (12 more...)

arXiv.org Artificial Intelligence

2311.04923

Country: Europe > France (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Are cascade dialogue state tracking models speaking out of turn in spoken dialogues?

Druart, Lucas, Jacqmin, Léo, Favre, Benoît, Rojas-Barahona, Lina Maria, Vielzeuf, Valentin

arXiv.org Artificial IntelligenceNov-3-2023

In Task-Oriented Dialogue (TOD) systems, correctly updating the system's understanding of the user's needs is key to a smooth interaction. Traditionally TOD systems are composed of several modules that interact with one another. While each of these components is the focus of active research communities, their behavior in interaction can be overlooked. This paper proposes a comprehensive analysis of the errors of state of the art systems in complex settings such as Dialogue State Tracking which highly depends on the dialogue context. Based on spoken MultiWoz, we identify that errors on non-categorical slots' values are essential to address in order to bridge the gap between spoken and chat-based dialogue systems. We explore potential solutions to improve transcriptions and help dialogue state tracking generative models correct such errors.

dst, non-categorical slot, transcription, (14 more...)

arXiv.org Artificial Intelligence

2311.04922

Country:

North America > United States > Idaho > Ada County > Boise (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Contextual Data Augmentation for Task-Oriented Dialog Systems

Axman, Dustin, Ray, Avik, Garg, Shubham, Huang, Jing

arXiv.org Artificial IntelligenceOct-16-2023

Alexa, Siri, Google assistant) are able to accomplish various tasks by interacting with them via natural language conversation. Task-oriented dialog models form the core technology behind these applications, which understands users' natural language utterances [1, 2], keeps track of the conversation [3, 4], performs requested tasks (e.g. API calls) [5, 6], and generates appropriate meaningful response to the user [7, 8]. Training neural task-oriented dialog models [9, 10, 11], requires a large amount of annotated data, which is difficult to obtain for model developers. While crowd-sourcing and dialog simulation based on agent interplay [12, 13] addresses this issue to a certain extent, these are slow and don't provide sufficient coverage of different natural language (NL) user turn surface form variations. Recently, large pre-trained language models (e.g. GPT-2 [14], T5 [15]) have been successfully used to generate fluent agent dialog responses, both with dialog context [16, 8, 17] or without it [18, 19]. However, it is unclear if similar models can capture the large variation of user turn distribution in such task-oriented dialogs. Previous work on data augmentation for spoken language understanding has largely focused on generating paraphrases of user utterance, with a specific goal and set of entities [20, 21, 22]. However, such utterances again fail to provide sufficient coverage of the large semantic space possible between dialog turns, and may not improve performance of downstream task-oriented dialog systems.

augmentation, augmentation model, dialog, (14 more...)

arXiv.org Artificial Intelligence

2310.1038

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.86)

Add feedback

User Modeling for Task Oriented Dialogues

Gur, Izzeddin, Hakkani-Tur, Dilek, Tur, Gokhan, Shah, Pararth

arXiv.org Artificial IntelligenceNov-11-2018

We introduce end-to-end neural network based models for simulating users of task-oriented dialogue systems. User simulation in dialogue systems is crucial from two different perspectives: (i) automatic evaluation of different dialogue models, and (ii) training task-oriented dialogue systems. We design a hierarchical sequence-to-sequence model that first encodes the initial user goal and system turns into fixed length representations using Recurrent Neural Networks (RNN). It then encodes the dialogue history using another RNN layer. At each turn, user responses are decoded from the hidden representations of the dialogue level RNN. This hierarchical user simulator (HUS) approach allows the model to capture undiscovered parts of the user goal without the need of an explicit dialogue state tracking. We further develop several variants by utilizing a latent variable model to inject random variations into user responses to promote diversity in simulated user responses and a novel goal regularization mechanism to penalize divergence of user responses from the initial user goal. We evaluate the proposed models on movie ticket booking domain by systematically interacting each user simulator with various dialogue system policies trained with different objectives and users.

machine learning, natural language, user turn, (18 more...)

arXiv.org Artificial Intelligence

1811.04369

Country: North America > United States > California (1.00)

Genre: Research Report (0.83)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback